YouCat: Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments using WordNet & Wikipedia

نویسندگان

  • Subhabrata Mukherjee
  • Pushpak Bhattacharyya
چکیده

In this paper, we propose a weakly supervised system, YouCat, for categorizing Youtube videos into different genres like Comedy, Horror, Romance, Sports and Technology The system takes a Youtube video url as input and gives it a belongingness score for each genre. The key aspects of this work can be summarized as: (1) Unlike other genre identification works, which are mostly supervised, this system is mostly unsupervised, requiring no labeled data for training. (2) The system can easily incorporate new genres without requiring labeled data for the genres. (3) YouCat extracts information from the video title, meta description and user comments (which together form the video descriptor). (4) It uses Wikipedia and WordNet for concept expansion. (5) The proposed algorithm with a time complexity of O(|W|) (where (|W|) is the number of words in the video descriptor) is efficient to be deployed in web for real-time video categorization. Experimentations have been performed on real world Youtube videos where YouCat achieves an F-score of 80.9%, without using any labeled training set, compared to the supervised, multiclass SVM F-score of 84.36% for single genre prediction. YouCat performs better for multi-genre prediction with an F-Score of 90.48%. Weak supervision in the system arises out of the usage of manually constructed WordNet and genre description by a few root words.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Wikipedia for Hierarchical Finer Categorization

Wikipedia is one of the largest growing structured resources on the Web and can be used as a training corpus in natural language processing applications. In this work, we present a method to categorize named entities under the hierarchical fine-grained categories provided by the Wikipedia taxonomy. Such a categorization can be further used to extract semantic relations among these named entitie...

متن کامل

UCD-S1: A hybrid model for detecting semantic relations between noun pairs in text

We describe a supervised learning approach to categorizing inter-noun relations, based on Support Vector Machines, that builds a different classifier for each of seven semantic relations. Each model uses the same learning strategy, while a simple voting procedure based on five trained discriminators with various blends of features determines the final categorization. The features that character...

متن کامل

Model-Portability Experiments for Textual Temporal Analysis

We explore a semi-supervised approach for improving the portability of time expression recognition to non-newswire domains: we generate additional training examples by substituting temporal expression words with potential synonyms. We explore using synonyms both from WordNet and from the Latent Words Language Model (LWLM), which predicts synonyms in context using an unsupervised approach. We ev...

متن کامل

WikiSent: Weakly Supervised Sentiment Analysis through Extractive Summarization with Wikipedia

This paper describes a weakly supervised system for sentiment analysis in the movie review domain. The objective is to classify a movie review into a polarity class, positive or negative, based on those sentences bearing opinion on the movie alone. The irrelevant text, not directly related to the reviewer opinion on the movie, is left out of analysis. Wikipedia incorporates the world knowledge ...

متن کامل

Weakly Supervised Learning of Object Segmentations from Web-Scale Video

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervis...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012